cuDNN support cleanup #15812
Conversation
Great to see this cleanup! While I understand the intention of the assert, I feel it's a bit obsolete and makes the code a bit cluttered, if you know what I mean. Couldn't we have the assertion in a central place (i.e. on init) instead of re-asserting the same thing over and over? The minimum cuDNN version is 7, so we treat it as the default. That might make the code a bit cleaner. But I don't have strong feelings about it, tbh.
I did consider the 'single assert' approach you mentioned. I ultimately advocated sprinkling a handful of asserts within the files having the actual dependencies. See #15449 (comment). We now have 17 assert lines. I don't think every cuDNN operator impl needs to state its dependency, since some are fine with very old cuDNN versions. How about as we advance the cuDNN support level, we clean out the old asserts that mention earlier versions that aren't out 'in the wild' anymore? Since you say you don't feel strongly, I prefer to leave the asserts as is; we can always move toward fewer asserts going forward. I notice now the stats on this PR: a net of about 750 lines cleaned out!
@@ -44,6 +44,10 @@ namespace op {
*/
template <typename CuDNNAlgoType>
class CuDNNAlgo {
// Compile guard needed to get past non-nvcc compilation of cudnn_algoreg.cc.
Why?
I've pushed a fix that removes the compile guard. Before the fix (and without the guard), the following compile step failed:
g++ ... -DMSHADOW_USE_CUDNN=1 ... -c cudnn_algoreg.cc
The problem is that the cudnn_algoreg-inl.h and cudnn_algoreg.cc files look at MXNET_USE_CUDNN, not MSHADOW_USE_CUDNN. The header file ./include/mxnet/libinfo.h has the vital missing line that equates the two:
#define MXNET_USE_CUDNN MSHADOW_USE_CUDNN
So this was a case of not adhering to the rule 'include what you use.' The header file ./src/common/cuda_utils.h was referring to MXNET_USE_CUDNN, but not including any header file that would ensure libinfo.h was included. Now fixed.
Now that MSHADOW is part of MXNET and no longer a submodule, is there a benefit to consolidating the dual naming as in {MSHADOW,MXNET}_USE_CUDNN?
@@ -44,6 +44,9 @@ enum CuDNNBatchNormOpAuxiliary {kMovingMean, kMovingInvVar};
#if defined(__CUDACC__)
template<typename DType>
class CuDNNBatchNormOp {
// This file was formerly included in batch_norm.cu only when CUDNN_MAJOR >= 5
This comment may be useful for review of this PR, but I don't think it makes sense to have it going forward.
I can take this out, along with the explanations of which cuDNN features are being used that require the assert, as you mention below.
I'm not sure how I feel about the comments like "This file uses X and Y feature of cudnn 7" - I don't think it is actually useful for maintenance.
Sounds good to me, thanks for the explanation! Yeah it's certainly a great cleanup, no doubt about that :)
#if CUDNN_VERSION >= 7200
if (GetEnvAllowTensorCore() && GetEnvAllowTensorCoreConversion() &&
    (DataType<DType>::kFlag != kFloat16))
  math_type = CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION;
Non-blocking for this PR, but this change reminded me that I believe there's a bug here. IIRC the problem was that math_type can be reset by auto-tuning. Here's an issue I opened a while ago that I haven't had time to follow up on:
"Auto-tuning is overwriting the math mode at convolution tuning time. Probably the right thing to do when implementing TCs but it's preventing the conversion math type from being used. We'll have to think about the long-term fix for this, but I've currently commented out the math type reset locally and I'm trying to verify this cudnn feature provides a significant speedup before moving forward. "
LGTM. Nit that we could remove the comment already highlighted, but the PR LGTM.
* Remove old cudnn support (<v7). * Simplify conv impl selection. * Remove comments justifying STATIC_ASSERT_CUDNN_VERSION_GE.
Description
Background: Now that MXNet 1.5 no longer compiles against cuDNN v6 and earlier, any cuDNN code that special-cases those earlier versions is needlessly cluttering the codebase. PR #15449 introduced the STATIC_ASSERT_CUDNN_VERSION_GE() macro as a way to document and enforce the requirements of the codebase locally in the files that create those dependencies.
This PR removes code that needlessly adapts operators to cuDNN v6 (and earlier) APIs and uses STATIC_ASSERT_CUDNN_VERSION_GE() to enforce the restriction to operate only against cuDNN v7 and later. This cleanup helps create a concise and understandable codebase able to accept further feature improvements.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments